
    Redundant disk arrays: Reliable, parallel secondary storage

    During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems but, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among the alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
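
    The parity scheme mentioned above can be illustrated with a minimal sketch, assuming fixed-size byte blocks and a self-identifying failure (the array knows which disk is gone); the function names below are illustrative, not from the paper.

    # Parity sketch: the parity block is the XOR of the data blocks, so any
    # single, known-missing block can be rebuilt by XOR-ing the survivors.
    def parity_block(blocks):
        """Compute the XOR parity of equal-length byte blocks."""
        parity = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)

    def rebuild_missing(surviving_blocks):
        """Recover the single missing block (data or parity) from the rest."""
        return parity_block(surviving_blocks)

    # Example: four data disks plus one parity disk; disk 2 fails.
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = parity_block(data)
    assert rebuild_missing([data[0], data[1], data[3], parity]) == data[2]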

    Structure-Aware Dynamic Scheduler for Parallel Machine Learning

    Training large machine learning (ML) models with many variables or parameters can take a long time if one employs sequential procedures, even with stochastic updates. A natural solution is to turn to distributed computing on a cluster; however, naive, unstructured parallelization of ML algorithms does not usually lead to a proportional speedup and can even result in divergence, because dependencies between model elements can attenuate the computational gains from parallelization and compromise the correctness of inference. Recent efforts to address this issue have benefited from exploiting the static, a priori block structures residing in ML algorithms. In this paper, we take this path further by exploring the dynamic block structures and workloads that arise during ML program execution, which offer new opportunities for improving convergence, correctness, and load balancing in distributed ML. We propose and showcase a general-purpose scheduler, STRADS, for coordinating distributed updates in ML algorithms, which harnesses the aforementioned opportunities in a systematic way. We provide theoretical guarantees for our scheduler, and demonstrate its efficacy versus static block structures on Lasso and Matrix Factorization.
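
    The abstract does not detail STRADS's internals, but the general idea of dynamic, dependency-aware scheduling can be sketched roughly as follows (all names are hypothetical): prioritize parameters that changed most in recent rounds, and avoid co-scheduling parameters whose features are strongly correlated, since those updates would interfere.

    # Illustrative dynamic scheduler for parallel coordinate updates (e.g. Lasso):
    # (1) favor coordinates with large recent changes (dynamic workload);
    # (2) avoid grouping strongly dependent coordinates (block structure).
    import numpy as np

    def schedule_round(X, recent_delta, k=8, max_corr=0.1):
        """Pick up to k weakly-correlated coordinates, favoring recent movers."""
        candidates = np.argsort(-np.abs(recent_delta))   # biggest movers first
        chosen = []
        for j in candidates:
            # Reject j if its feature column correlates strongly with any chosen one.
            if all(abs(np.corrcoef(X[:, j], X[:, c])[0, 1]) < max_corr for c in chosen):
                chosen.append(j)
            if len(chosen) == k:
                break
        return chosen   # these coordinates can be updated in parallel this round

    After each round, recent_delta would be refreshed with the absolute coefficient changes, so the schedule adapts as coefficients converge.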

    High-Performance Distributed ML at Scale through Parameter Server Consistency Models

    As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires considerable expertise in writing distributed code, while highly-abstracted frameworks like Hadoop have not, in practice, approached the performance seen in specialized ML implementations. The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML applications into distributed ones, while maintaining high throughput through relaxed "consistency models" that allow inconsistent parameter reads. However, due to insufficient theoretical study, it is not clear which of these consistency models can really ensure correct ML algorithm output; at the same time, there remain many theoretically-motivated but undiscovered opportunities to maximize computational throughput. Motivated by this challenge, we study both the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms in existing PS consistency models. We then use the gleaned insights to improve a consistency model using an "eager" PS communication mechanism, and implement it as a new PS system that enables ML algorithms to reach their solution more quickly. Comment: 19 pages, 2 figures.
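
    The consistency models themselves are not spelled out in the abstract; as one common example, a bounded-staleness read rule can be sketched as follows (class and method names are illustrative, not the paper's API): a worker at iteration t may use a cached parameter copy only if it is at most s iterations stale, and must otherwise fetch a fresh value from the server.

    # Sketch of a bounded-staleness parameter read, one of the relaxed
    # consistency models studied in the parameter-server literature.
    class StaleBoundedCache:
        def __init__(self, server, staleness_bound):
            self.server = server        # assumed to expose a fetch(key) method
            self.s = staleness_bound    # maximum allowed staleness, in iterations
            self.cache = {}             # key -> (value, iteration when fetched)

        def read(self, key, current_iter):
            entry = self.cache.get(key)
            if entry is not None:
                value, fetched_at = entry
                if current_iter - fetched_at <= self.s:
                    return value        # stale, but within the bound: OK to use
            # Too stale or never fetched: synchronize with the server.
            value = self.server.fetch(key)
            self.cache[key] = (value, current_iter)
            return value

    An "eager" variant, as hinted at in the abstract, would additionally push fresh values to workers as soon as the server receives them, rather than waiting for a read to exceed the staleness bound.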

    Luminosity Evolution of Early-type Galaxies to z=0.83: Constraints on Formation Epoch and Omega

    We present deep spectroscopy with the Keck telescope of eight galaxies in the luminous X-ray cluster MS1054-03 at z=0.83. The data are combined with imaging observations from the Hubble Space Telescope (HST). The spectroscopic data are used to measure the internal kinematics of the galaxies, and the HST data to measure their structural parameters. Six galaxies have early-type spectra, and two have "E+A" spectra. The galaxies with early-type spectra define a tight Fundamental Plane (FP) relation. The evolution of the mass-to-light ratio is derived from the FP. The M/L ratio evolves as \Delta log M/L_B \propto -0.40 z (Omega_m=0.3, Omega_Lambda=0). The observed evolution of the M/L ratio provides a combined constraint on the formation redshift of the stars, the IMF, and cosmological parameters. For a Salpeter IMF (x=2.35) we find that z_form>2.8 and Omega_m<0.86 with 95% confidence. The constraint on the formation redshift is weaker if Omega_Lambda>0: z_form>1.7 if Omega_m=0.3 and Omega_Lambda=0.7. At present the limiting factor in constraining z_form and Omega from the observed luminosity evolution of early-type galaxies is the poor understanding of the IMF. We find that if Omega_m=1, the IMF must be significantly steeper than the Salpeter IMF (x>2.6). Comment: To be published in ApJ Letters, Volume 504, September 1, 1998. 5 pages, 4 figures.
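
    A quick reading of the quoted relation (an illustration of the stated slope only, not an additional result): at the cluster redshift z = 0.83 the offset is about -0.40 x 0.83 = -0.33 dex in log(M/L_B), i.e. the stellar populations were brighter per unit mass in the B band by a factor of roughly 10^0.33, about 2.1.

    # Arithmetic on the quoted relation, for illustration only.
    z = 0.83
    delta_log_ML = -0.40 * z              # ~ -0.33 dex in log(M/L_B)
    brightening = 10 ** (-delta_log_ML)   # ~ 2.1x higher L/M than locally
    print(delta_log_ML, brightening)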

    Approaches to Capacity Building for Machine Learning and Artificial Intelligence Applications in Health

    Many health systems and research institutes are interested in supplementing their traditional analyses of linked data with machine learning (ML) and other artificial intelligence (AI) methods and tools. However, the availability of individuals who have the required skills to develop and/or implement ML/AI is a constraint, as there is high demand for ML/AI talent in many sectors. The three organizations presenting are all actively involved in training and capacity building for ML/AI broadly, and each has a focus on, and/or discrete initiatives for, particular trainees.

    P. Alison Paprica, Vector Institute for Artificial Intelligence, Institute for Clinical Evaluative Sciences, University of Toronto, Canada. Alison is VP, Health Strategy and Partnerships at Vector, responsible for health strategy and also playing a lead role in "1000AIMs", a Vector-led initiative in support of the Province of Ontario's $30 million investment to increase the number of AI-related master's program graduates to 1,000 per year within five years.

    Frank Sullivan, University of St Andrews, Scotland. Frank is a family physician and an associate director of HDRUK@Scotland. Health Data Research UK (https://hdruk.ac.uk/) has recently provided funding to six sites across the UK to address challenging healthcare issues through the use of data science. A Doctoral Training Scheme in AI with 50 PhD students has also been announced. Each site works in close partnership with National Health Service bodies and the public to translate research findings into benefits for patients and populations.

    Yin Aphinyanaphongs, INTREPID NYU clinical training program for incoming clinical fellows. Yin is the Director of the Clinical Informatics Training Program at NYU Langone Health. He is deeply interested in the intersection of computer science and health care, and as a physician and a scientist he has a unique perspective on how to train medical professionals for a data-driven world. One version of this teaching process is demonstrated in the INTREPID clinical training program, in which Yin teaches clinicians to work with large-scale data within the R environment and to generate hypotheses and insights.

    The session will begin with three brief presentations, followed by a facilitated session in which all participants share their insights about the essential skills and competencies required for different kinds of ML/AI applications and contributions. Live polling and voting will be used at the end of the session to capture participants' views on the key learnings and take-away points. The intended outputs and outcomes of the session are:
    • Participants will have a better understanding of the skills and competencies required for individuals to contribute to AI applications in health in various ways.
    • Participants will gain knowledge about different options for capacity building, from targeted enhancement of the skills of clinical fellows, to producing large numbers of applied master's graduates, to doctoral-level training.
    After the session, the co-leads will work together to create a resource that summarizes the learnings from the session and make it public (through publication in a peer-reviewed journal and/or through the IPDLN website).

    A Database of Cepheid Distance Moduli and TRGB, GCLF, PNLF and SBF Data Useful for Distance Determinations

    We present a compilation of Cepheid distance moduli and data for four secondary distance indicators that employ stars in the old stellar populations: the planetary nebula luminosity function (PNLF), the globular cluster luminosity function (GCLF), the tip of the red giant branch (TRGB), and the surface brightness fluctuation (SBF) method. The database includes all data published as of July 15, 1999. The main strength of this compilation resides in all data being on a consistent and homogeneous system: all Cepheid distances are derived using the same calibration of the period-luminosity relation, the treatment of errors is consistent across indicators, and measurements that are not considered reliable are excluded. As such, the database is ideal for inter-comparing any of the distance indicators considered, or for deriving a Cepheid calibration for any secondary distance indicator. Specifically, the database includes: 1) Cepheid distances, extinctions, and metallicities; 2) apparent magnitudes of the PNLF cutoff; 3) apparent magnitudes and colors of the turnover of the GCLF (in both the V- and B-bands); 4) apparent magnitudes of the TRGB (in the I-band) and V-I colors at, and 0.5 magnitudes fainter than, the TRGB; 5) apparent surface brightness fluctuation magnitudes in I, K', and K_short, and using the F814W filter with the HST/WFPC2. In addition, for every galaxy in the database we give reddening estimates from DIRBE/IRAS as well as HI maps, J2000 coordinates, Hubble and T-type morphological classification, apparent total magnitude in B, and systemic velocity. (Abridged) Comment: Accepted for publication in the Astrophysical Journal Supplement Series. Because of space limitations, the figures included are low-resolution bitmap images. Original figures can be found at http://www.astro.ucla.edu/~laura/pub.ht
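
    The field list above maps naturally onto a per-galaxy record; the sketch below is a hypothetical schema (field names invented for illustration, not the published table layout) showing how the compiled quantities could be organized.

    # Hypothetical per-galaxy record mirroring the quantities listed in the
    # abstract; the field names are invented for illustration.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class GalaxyDistanceRecord:
        name: str
        ra_j2000_deg: float
        dec_j2000_deg: float
        t_type: Optional[int] = None                  # morphological classification
        total_mag_B: Optional[float] = None
        systemic_velocity_kms: Optional[float] = None
        reddening_dirbe_iras: Optional[float] = None
        # Primary indicator
        cepheid_distance_modulus: Optional[float] = None
        cepheid_extinction: Optional[float] = None
        cepheid_metallicity: Optional[float] = None
        # Secondary indicators
        pnlf_cutoff_mag: Optional[float] = None
        gclf_turnover_mag_V: Optional[float] = None
        gclf_turnover_mag_B: Optional[float] = None
        trgb_mag_I: Optional[float] = None
        sbf_mag_I: Optional[float] = None
        sbf_mag_F814W: Optional[float] = None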